Inspiration: https://gradientdescending.com/survivor-data-from-the-tv-series-in-r/
Where do players come from?
# tm_shape(usmapdata::us_map) +
# tm_fill() +
# tm_borders()
Who played the most?
participations_count <- castaways %>%
  group_by(castaway_id, full_name) %>%
  summarise(num_participations = n()) %>%
  arrange(desc(num_participations))
## `summarise()` has grouped output by 'castaway_id'. You can override using the
## `.groups` argument.
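The message above appears because `summarise()` keeps the first grouping variable by default. Below is a minimal sketch of two ways to avoid it. It uses a small made-up table, `castaways_toy`, as a stand-in for the real `castaways` data; the toy rows and the `participations_toy*` names are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data standing in for `castaways` (one row per season played)
castaways_toy <- tibble(
  castaway_id = c("A1", "A1", "B2", "C3", "C3", "C3"),
  full_name   = c("Ann", "Ann", "Bob", "Cy", "Cy", "Cy")
)

# Option 1: drop the grouping explicitly after summarising
participations_toy <- castaways_toy %>%
  group_by(castaway_id, full_name) %>%
  summarise(num_participations = n(), .groups = "drop") %>%
  arrange(desc(num_participations))

# Option 2: count() groups, tallies and ungroups in one step
participations_toy2 <- castaways_toy %>%
  count(castaway_id, full_name, name = "num_participations", sort = TRUE)
```

Both options return an ungrouped table, so later joins and filters should not carry a leftover grouping.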
Memorable players - Played at least two seasons or made the jury
## `summarise()` has grouped output by 'castaway_id'. You can override using the
## `.groups` argument.
## Joining, by = c("castaway_id", "full_name")
Players who have been able to play more than once without making it to the jury
Types of challenges
within(challenge_description, rm(challenge_id, challenge_name)) %>%
  summarise_each(funs = mean) %>%
  sapply(round, 3) * 100
## Warning: `summarise_each_()` was deprecated in dplyr 0.7.0.
## Please use `across()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
##     puzzle       race  precision  endurance   strength turn_based    balance
##       26.9       81.4       20.8       13.0        5.6       14.9       16.1
##       food  knowledge     memory       fire      water
##        2.6        6.2        2.4        3.7       21.8
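The deprecation warning points to `across()`. Here is a minimal sketch of what an equivalent computation might look like, run on a hypothetical toy table (`challenges_toy`) rather than the real `challenge_description` data. It assumes each challenge-type column is a logical or 0/1 flag, so the column mean is the share of challenges of that type.

```r
library(dplyr)

# Hypothetical toy stand-in for `challenge_description`:
# one row per challenge, one flag column per challenge type
challenges_toy <- tibble(
  challenge_id   = c("CH1", "CH2", "CH3", "CH4"),
  challenge_name = c("a", "b", "c", "d"),
  puzzle = c(TRUE, FALSE, FALSE, FALSE),
  race   = c(TRUE, TRUE, TRUE, FALSE)
)

shares <- challenges_toy %>%
  select(-challenge_id, -challenge_name) %>%   # keep only the flag columns
  summarise(across(everything(), mean)) %>%    # share of challenges per type
  unlist()                                     # one-row tibble -> named vector

# Express the shares as percentages
round(shares, 3) * 100
```

If the real columns contain missing values, passing `na.rm = TRUE` through an anonymous function such as `\(x) mean(x, na.rm = TRUE)` may be needed.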
Winners' confessionals vs. everyone else's
## `summarise()` has grouped output by 'season_name', 'castaway'. You can override
## using the `.groups` argument.
There are 16 personality types. We would expect each type to account for roughly 1/16 of the players, i.e. 6.25%.
## # A tibble: 2 × 2
##   is_introvert count
##   <lgl>        <int>
## 1 FALSE           26
## 2 TRUE            15
## # A tibble: 3 × 2
##   is_introvert count
##   <lgl>        <int>
## 1 FALSE          423
## 2 TRUE           336
## 3 NA              21
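To make the counts easier to compare, they can be turned into percentages. The sketch below copies the counts from the two tables above and drops the `NA` row. The vector names (`first_table`, `second_table`) are hypothetical, and which group of players each table describes is not stated in the output, so no interpretation is assumed here.

```r
# Baseline share per personality type if all 16 types were equally common
baseline_pct <- round(100 / 16, 2)

# Counts copied from the two tables above (the NA row is excluded)
first_table  <- c(extrovert = 26, introvert = 15)
second_table <- c(extrovert = 423, introvert = 336)

# Percentage of each group within its table
share_first  <- round(prop.table(first_table) * 100, 1)
share_second <- round(prop.table(second_table) * 100, 1)

baseline_pct
share_first
share_second
```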
The number of votes is not a good predictor of who wins
plot_lm(castaways$day, castaways$immunity_idols_won)
# Flag winners once, then reuse the flag to color the points
castaways$is_winner <- castaways$result == "Sole Survivor"
plot(castaways$day, castaways$immunity_idols_won, col = ifelse(castaways$is_winner, "green", "black"))
# Show Rating
Has the show's popularity or quality declined? (sort by color)
p <- viewers %>%
  ggplot(aes(x = episode_date, y = viewers)) +
  geom_area(fill = "#69b3a2", alpha = 0.5) +
  geom_line(color = "#69b3a2") +
  ylab("Viewers (millions)") +
  xlab("Date") +
  theme_ipsum()
ggplotly(p)
## Warning: Removed 22 rows containing missing values (position_stack).
IMDB Rating